Decipher the Modality Gap in Multimodal Contrastive Learning: From Convergent Representations to Pairwise Alignment
Lingjie Yi, Raphael Douady, Chao Chen
Multimodal contrastive learning (MCL) aims to embed data from different modalities in a shared embedding space. However, empirical evidence shows that representations from different modalities occupy completely separate regions of the embedding space, a phenomenon referred to as the modality gap. Moreover, experimental findings on how the size of the modality gap affects downstream performance are inconsistent. These observations raise two key questions: (1) What causes the modality gap? (2) How does it affect downstream tasks? To address these questions, this paper introduces the first theoretical framework for analyzing the convergent optimal representations of MCL and the modality alignment achieved when training is fully optimized. Specifically, we prove that without any constraint, or under the cone constraint, the modality gap converges to zero. Under the subspace constraint (i.e., the representations of the two modalities fall into two distinct hyperplanes due to dimension collapse), the modality gap converges to the smallest angle between the two hyperplanes. This result identifies \emph{dimension collapse} as the fundamental origin of the modality gap. Furthermore, our theorems demonstrate that paired samples cannot be perfectly aligned under the subspace constraint; the modality gap thus influences downstream performance by affecting the alignment between sample pairs. We prove that, in this case, perfect alignment between the two modalities can still be achieved in two ways: hyperplane rotation and shared-space projection.
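The two quantities the abstract reasons about can be made concrete numerically. Below is a minimal sketch (ours, not the authors' code) that measures (1) the centroid distance between modalities, a common operationalization of the modality gap, and (2) the smallest principal angle between the two modality subspaces, the quantity the subspace-constraint result says the gap converges to. The synthetic embeddings and the choice of top-k principal directions are assumptions made only for illustration.

```python
# Minimal sketch (not from the paper): two ways to quantify the modality
# gap for paired image/text embeddings.
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)

# Hypothetical stand-ins for CLIP-style embeddings: n paired samples in R^d.
n, d = 512, 64
img = rng.normal(size=(n, d))
txt = rng.normal(size=(n, d)) + 2.0  # shifted to simulate a gap

# L2-normalize, as contrastive models typically embed on the unit sphere.
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

# (1) Centroid gap: distance between the two modality means.
gap_vector = img.mean(axis=0) - txt.mean(axis=0)
print("centroid gap:", np.linalg.norm(gap_vector))

# (2) Smallest principal angle between the two modality subspaces,
#     spanning each modality by its top-k principal directions via SVD.
k = 8
U_img = np.linalg.svd(img - img.mean(0), full_matrices=False)[2][:k].T
U_txt = np.linalg.svd(txt - txt.mean(0), full_matrices=False)[2][:k].T
angles = subspace_angles(U_img, U_txt)  # returned in descending order
print("smallest principal angle (rad):", angles.min())
```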
A Details of the empirical setup in Section 3.4
Our model is one of the simplest possible for studying specialization in a supply-side marketplace. First, the infinite, high-dimensional content embedding space captures the fact that digital goods cannot be cleanly clustered into categories but are instead often mixtures of several dimensions (e.g., a movie can be both a drama and a comedy); see Anderson et al. [1992] for a textbook treatment. The assumption that all producers share the same cost function is also a simplification, but, perhaps surprisingly, it still allows us to study specialization.

Proposition 4. For any set of users and any $\beta \geq 1$, a pure strategy equilibrium does not exist.
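To make a nonexistence claim like Proposition 4 tangible, here is a toy sketch, not the paper's model or proof: a discretized two-producer game with a shared production cost, exhaustively checked for pure strategy equilibria. The two-user market, the 3x3 candidate grid, and the cost form $\|p\|^{\beta}$ with $\beta = 1$ are all hypothetical choices; in this instance, no strategy profile is a mutual best response.

```python
# Toy sketch (not the paper's construction): a tiny discretized producer
# game, exhaustively checked for pure strategy equilibria. The user taste
# vectors, the candidate grid, and beta are hypothetical.
import itertools
import numpy as np

users = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # user taste vectors
grid = [np.array(c) for c in itertools.product([0.0, 0.6, 1.2], repeat=2)]
beta = 1.0                                            # cost = ||p||^beta

def utility(p, q):
    """Payoff of a producer playing p against an opponent playing q:
    users won (ties split evenly) minus the production cost."""
    won = sum(1.0 if u @ p > u @ q else 0.5 if u @ p == u @ q else 0.0
              for u in users)
    return won - np.linalg.norm(p) ** beta

# A profile (p, q) is a pure strategy equilibrium iff each strategy is a
# best response to the other.
equilibria = [
    (tuple(p), tuple(q))
    for p in grid for q in grid
    if utility(p, q) >= max(utility(r, q) for r in grid)
    and utility(q, p) >= max(utility(r, p) for r in grid)
]
print("pure strategy equilibria found:", equilibria)  # prints: []
```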
E-Gen: Leveraging E-Graphs to Improve Continuous Representations of Symbolic Expressions
Hongbo Zheng, Suyuan Wang, Neeraj Gangwar, Nickvash Kani
As vector representations have been pivotal in advancing natural language processing (NLP), prior research has explored embedding techniques for mathematical expressions that leverage mathematically equivalent expressions. While effective, these methods are limited by their training data. In this work, we propose augmenting prior algorithms with a larger synthetic dataset produced by a novel e-graph-based generation scheme. This scheme, E-Gen, improves upon prior dataset-generation schemes that are limited in size and in the operator types they cover. We use this dataset to compare embedding models trained with two methods: (1) training the model to generate mathematically equivalent expressions, and (2) training the model with contrastive learning to explicitly group mathematically equivalent expressions. We evaluate the resulting embeddings against prior work on both in-distribution and out-of-distribution language processing tasks. Finally, we compare our embedding scheme against state-of-the-art large language models and show that embedding-based methods outperform LLMs on several tasks, demonstrating the necessity of optimizing embedding methods for the mathematical data modality.
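As a concrete illustration of method (2), here is a minimal sketch of a contrastive objective that pulls embeddings of equivalent expressions together. The bag-of-characters encoder, the three example pairs, and the temperature are hypothetical stand-ins, not E-Gen's actual pipeline; a real system would use a sequence encoder trained on e-graph-generated equivalences.

```python
# Minimal sketch (not E-Gen's training code): InfoNCE-style contrastive
# loss over pairs of mathematically equivalent expressions.
import torch
import torch.nn.functional as F

# Hypothetical equivalent-expression pairs (anchor, positive).
pairs = [("x*(y+z)", "x*y+x*z"),
         ("sin(t)**2+cos(t)**2", "1"),
         ("(a+b)**2", "a**2+2*a*b+b**2")]

vocab = sorted(set("".join(s for p in pairs for s in p)))
stoi = {ch: i for i, ch in enumerate(vocab)}

def encode(expr):
    """Toy bag-of-characters featurizer standing in for a real encoder."""
    v = torch.zeros(len(vocab))
    for ch in expr:
        v[stoi[ch]] += 1.0
    return v

emb = torch.nn.Linear(len(vocab), 16)        # trainable projection
opt = torch.optim.Adam(emb.parameters(), lr=1e-2)

for step in range(100):
    a = F.normalize(emb(torch.stack([encode(x) for x, _ in pairs])), dim=1)
    b = F.normalize(emb(torch.stack([encode(y) for _, y in pairs])), dim=1)
    logits = a @ b.T / 0.1                   # temperature tau = 0.1
    labels = torch.arange(len(pairs))        # positives on the diagonal
    loss = F.cross_entropy(logits, labels)   # InfoNCE over the batch
    opt.zero_grad(); loss.backward(); opt.step()

print("final contrastive loss:", loss.item())
```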